added 16 commits
May 6, 2020 10:00
The column parameter is useful for normalization, binning, and other operations. Adding it to the simple operations keeps the APIs consistent.
Operations will be used as default parameters in features. Making operations an import allows them to be used without changing the file ordering.
A NoOp operation is cleaner than checking for None. It is also a useful utility because it matches the operation interface while doing nothing.
In the local tests, the feature name was inconsistent with the source column name.
MinMax will be used as a scaling operation for Feature.
Forgot to do this before the last commit.
This makes the column have a different mean and median value to help testing.
The Sqrt object shouldn't be used.
These operations will be used to fill missing values.
These two operations provide a basis for outlier handling.
Truncate is slightly clearer and more consistent in the numeric feature definition.
Normalizing, truncating, etc. are all operations. This can lead to confusion, whereas transform is clearer.
This allows a variety of typical feature engineering techniques to be applied using StreamSQL.
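The NoOp and default-parameter commits describe a null-object pattern: a feature always holds some operation, so it never has to branch on None. Below is a minimal sketch of that idea in plain Python. It assumes a hypothetical Operation interface whose apply() receives the target column; the names Operation, NoOp, MinMax, NumericFeature and their signatures are illustrative only, not StreamSQL's actual API.

```python
from typing import Dict, List

Row = Dict[str, float]


class Operation:
    """Hypothetical interface: an operation rewrites one named column of a table."""

    def apply(self, rows: List[Row], column: str) -> List[Row]:
        raise NotImplementedError


class NoOp(Operation):
    """Null-object operation: same interface as any other, but changes nothing."""

    def apply(self, rows: List[Row], column: str) -> List[Row]:
        return [dict(r) for r in rows]


class MinMax(Operation):
    """Linearly rescale `column` into [0, 1]."""

    def apply(self, rows: List[Row], column: str) -> List[Row]:
        values = [r[column] for r in rows]
        lo, hi = min(values), max(values)
        span = hi - lo
        out = []
        for r in rows:
            scaled = dict(r)
            # A constant column has no range; map it to 0.0 rather than divide by zero.
            scaled[column] = (r[column] - lo) / span if span else 0.0
            out.append(scaled)
        return out


class NumericFeature:
    """Hypothetical feature whose transform defaults to NoOp instead of None."""

    # NoOp is stateless, so sharing one default instance across features is safe.
    def __init__(self, column: str, transform: Operation = NoOp()) -> None:
        self.column = column
        self.transform = transform

    def compute(self, rows: List[Row]) -> List[Row]:
        # No `if self.transform is not None` branch is needed here.
        return self.transform.apply(rows, self.column)


rows = [{"price": 10.0}, {"price": 20.0}, {"price": 30.0}]
print(NumericFeature("price").compute(rows))
print(NumericFeature("price", MinMax()).compute(rows))
```

Passing the column to apply() rather than to the constructor is an assumption made here so that one stateless NoOp() can serve as the default argument; if operations took the column in their constructor, each feature would have to build its own default.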
Numeric features allow a user to take their source tables and easily apply a variety of feature engineering techniques to them. They effectively separate general data engineering tasks (e.g., joining the actions and users tables) from data science-oriented ones (e.g., filling missing values with the mean, truncating outliers, and min-max scaling the column).
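The three data science steps named in the description (fill missing values with the mean, truncate outliers, min-max scale) can be sketched as a small pipeline over a single column. This is a hedged illustration in plain Python, not StreamSQL's implementation; the function names, the use of None to mark missing values, and the explicit truncation bounds are all assumptions.

```python
from statistics import mean
from typing import List, Optional


def fill_mean(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the present ones.

    Raises statistics.StatisticsError if every value is missing.
    """
    avg = mean(v for v in values if v is not None)
    return [avg if v is None else v for v in values]


def truncate(values: List[float], lower: float, upper: float) -> List[float]:
    """Clamp every value into [lower, upper], capping outliers at the bounds."""
    return [min(max(v, lower), upper) for v in values]


def min_max(values: List[float]) -> List[float]:
    """Linearly rescale values into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def numeric_feature(values: List[Optional[float]], lower: float, upper: float) -> List[float]:
    """Hypothetical pipeline: fill missing -> truncate outliers -> min-max scale."""
    return min_max(truncate(fill_mean(values), lower, upper))


raw = [2.0, None, 4.0, 100.0, 6.0]
print(numeric_feature(raw, lower=0.0, upper=10.0))
```

Order matters in this sketch: the mean is computed before outliers are truncated, so the 100.0 pulls the fill value up to 28.0, which truncation then caps at 10.0. Filling with the median instead would be less sensitive to such an outlier.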